Difference in Difference models

Difference in differences: basic intuition

  • Slightly tweaking the traditional counterfactual, DiD asks: “what would happen to the trend for this unit had it never received the treatment?”

    • How would your rate of growth change if you ate more vegetables?

    • What would happen to inflation if the Federal Reserve lowered interest rates?

    • How would the Arab Spring have unfolded if participants lacked access to cell phones and social media?

Difference in differences model

  • Probably the oldest non-experimental method of causal inference (likely dates back to 1855)

  • Units must be observed before and after the “treatment”, so most commonly applied to panel data.

  • If assumptions are met, can control for both observed and unobserved confounding.

John Snow and the London Cholera epidemic

  • 1854 Broad Street Cholera outbreak killed over 600 people in a poor district of London. What caused the outbreak?

  • What causes Cholera in general?

  • What interventions work?

Miasma theory

The immediate and chief cause of diseases is atmospheric impurity arising from decomposing remnants of the substances used for food and from the impurities given out from their own bodies. (Neil Arnott, 1844)

Snow, however, found the initial outbreak was clustered around a single water pump on Broad Street. (73 of 83 initial deaths nearer to the Broad Street pump than any other)

Snow’s map of Cholera outbreaks

The transmission of Cholera

Largely at Snow’s behest, the pump’s handle was removed, and the epidemic subsided, but does this tell us much? Outbreaks tend to subside!

Southwark and Vauxhall

  • Southwark and Vauxhall water company supplied 40,000+ homes from a reservoir that drew directly from the Thames

  • Supply had a well-established reputation for being…gross.

John Edwards “Sovereign of scented streams”

Lambeth waterworks

Lambeth waterworks, while it also drew from the Thames, moved their reservoir far upstream of the city in 1852.

The natural experiment

| Water supply | Cholera deaths, 1849 (rate per 100,000) | Cholera deaths, 1854 (rate per 100,000) |
|---|---|---|
| Southwark & Vauxhall Company only | 1349 | 1466 |
| Lambeth Company only | 847 | 193 |

The natural experiment

Note that the companies have different starting points (Lambeth was already cleaner even by 1849), but miasma theory might lead you to expect the same trend.

The natural experiment

If we can assume a parallel trend, then the relationship should look like this. The effect size, then, would be the difference between the counterfactual case and the observed case.
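As a rough sketch of that logic, the counterfactual for Lambeth can be built by hand from the four rates in the table: take Lambeth's 1849 rate and add the change observed for Southwark & Vauxhall. The variable names are made up for illustration, and the calculation uses the rounded rates as printed.

```r
# Cholera death rates per 100,000, taken from the table above
sv_1849      <- 1349  # Southwark & Vauxhall (comparison), before
sv_1854      <- 1466  # Southwark & Vauxhall (comparison), after
lambeth_1849 <- 847   # Lambeth (treated), before
lambeth_1854 <- 193   # Lambeth (treated), after

# Parallel trends: without the move, Lambeth's rate would have changed
# by the same amount as the comparison company's rate
control_change         <- sv_1854 - sv_1849
lambeth_counterfactual <- lambeth_1849 + control_change

# Effect = observed treated outcome minus counterfactual treated outcome
did_estimate <- lambeth_1854 - lambeth_counterfactual

print(c(counterfactual = lambeth_counterfactual, effect = did_estimate))
```

Under these assumptions the counterfactual 1854 rate for Lambeth is 964 per 100,000, and the estimated effect is −771 per 100,000.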

The effect of moving pumps

| Water supply | Cholera deaths, 1849 (rate per 100,000) | Cholera deaths, 1854 (rate per 100,000) | Difference in rates, 1854 minus 1849 (per 100,000) |
|---|---|---|---|
| Southwark & Vauxhall Company only | 1349 | 1466 | 118 |
| Lambeth Company only | 847 | 193 | −653 |
| Difference-in-difference, Lambeth versus Southwark & Vauxhall | −502 | −1273 | −771 |

The difference in difference estimator

  • Answers the question “what would have happened to the treated units if they had not received the treatment” (average treatment effect on the treated or ATT)

    • i.e. “if Lambeth had not moved the reservoir upstream, there would have been a parallel increase in the number of cholera deaths among their customers”

    • or “But for [the treatment] the trends between treated and control units should be parallel”

    • Does not require an assumption that observations are balanced on expected values of the outcome. Unobserved confounding only matters to the extent it impacts the trend.

      • Similar to fixed effects: all time-invariant characteristics are controlled.
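Written out, the estimator is the change in the treated group minus the change in the control group. The notation here is an assumption introduced for illustration, with \(\bar{Y}\) standing for the mean outcome in a given group and period:

\[ \widehat{\text{ATT}} = \left(\bar{Y}_{\text{treated, after}} - \bar{Y}_{\text{treated, before}}\right) - \left(\bar{Y}_{\text{control, after}} - \bar{Y}_{\text{control, before}}\right) \]

Any characteristic that shifts a group's outcome by the same amount in both periods appears in both terms inside that group's parentheses and cancels, which is why only confounders that affect the trend remain a problem.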

Assumptions

  • Parallel trends: lines would be parallel but for the treatment

    • Most important (and often the most difficult to justify)

  • Exogeneity of treatment with respect to expected trends: treatment isn’t a response to baseline outcome or expected outcomes.

  • No spillover: untreated units aren’t impacted by treatment.

  • Stable groups: the before/after populations for each group are the same

    • For panel studies this holds by design (barring attrition), but for repeated cross-sections it is a concern because people could leave or enter the groups at different times.

OLS as DiD

For a simple 2-group x 2-time period DiD model, we can get this entire thing from a fairly simple OLS model:

\[ \hat{Y} = B_0 + B_1 \text{Time} + B_2 \text{Treated} + B_3 (\text{Time} \times \text{Treated}) \]

  1. \(B_0\) The average for the control group at \(T=0\)

  2. \(B_1\) The change in the control group average from \(T=0\) to \(T=1\)

  3. \(B_2\) The difference between the treated and control units at \(T=0\)

  4. \(B_3\) The difference between the treated group’s change and the control group’s change from \(T=0\) to \(T=1\) (the difference-in-differences estimate)
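One way to check these interpretations is to write out the predicted mean for each of the four group-by-period cells, using the same coefficients and assuming Time and Treated are both coded 0/1:

\[ \begin{aligned} \hat{Y}_{\text{control},\,T=0} &= B_0 \\ \hat{Y}_{\text{control},\,T=1} &= B_0 + B_1 \\ \hat{Y}_{\text{treated},\,T=0} &= B_0 + B_2 \\ \hat{Y}_{\text{treated},\,T=1} &= B_0 + B_1 + B_2 + B_3 \end{aligned} \]

Subtracting the control group’s change from the treated group’s change leaves only the interaction coefficient:

\[ \left(\hat{Y}_{\text{treated},\,T=1} - \hat{Y}_{\text{treated},\,T=0}\right) - \left(\hat{Y}_{\text{control},\,T=1} - \hat{Y}_{\text{control},\,T=0}\right) = (B_1 + B_3) - B_1 = B_3 \]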

Example

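library(tidyverse)
library(broom)  # provides tidy(); not attached by library(tidyverse)

# Cholera death rates per 100,000 by water company and year
df <- data.frame(
  period = factor(rep(c(0, 1), 2), labels = c("before", "after")),             # 1849 vs. 1854
  group  = factor(rep(c(0, 1), each = 2), labels = c("control", "treatment")), # S&V vs. Lambeth
  deaths = c(1349, 1466, 847, 193)
)

# The period x group interaction is the difference-in-differences estimate
model <- lm(deaths ~ period * group, data = df)

tidy(model) |>
  select(term, estimate)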
| term | estimate |
|---|---|
| (Intercept) | 1.35e+03 |
| periodafter | 117 |
| grouptreatment | -502 |
| periodafter:grouptreatment | -771 |

Interpretation

In this setup, the interaction term represents our difference-in-difference estimate

| term | estimate |
|---|---|
| (Intercept) | 1.35e+03 |
| periodafter | 117 |
| grouptreatment | -502 |
| periodafter:grouptreatment | -771 |

| Water supply | Deaths 1849 | Deaths 1854 | 1854 - 1849 |
|---|---|---|---|
| S & V | 1349 | 1466 | 118 |
| Lambeth | 847 | 193 | −653 |
| DiD (Lambeth - S & V) | −502 | −1273 | −771 |

Card and Krueger 1994

Do minimum wage increases reduce employment rates?

  • Minimum wage increase on April 1, 1992 in New Jersey from $4.25 to $5.05 per hour
  • Card and Krueger surveyed fast food restaurants in New Jersey and Pennsylvania in two waves:
    • Wave 1: February - March 1992 (pre-treatment)
    • Wave 2: November - December 1992 (post-treatment)

Card and Krueger 1994

Results (reproduced by Angrist and Krueger)

Card and Krueger 1994

How might the parallel trends assumption be violated here? Some scenarios to consider:

  • If employers in New Jersey laid off employees in anticipation of the wage increase, then the job losses might have already happened by February

  • If stores that had layoffs failed to respond to the second wave of the survey, then the “after” period has a different composition compared to the first.

  • If Pennsylvania employers also raised wages in response to the New Jersey law, then the two trends aren’t really independent.

  • If New Jersey was more insulated from national economic trends than Pennsylvania, then the parallel trends assumption might not hold.

Card and Krueger 1994

  • Card and Krueger address some of these with alternate model specifications. Since the DiD model is essentially OLS, they can include controls for wave 1 characteristics the same way you would in a regular regression model.

  • The assumption then becomes “trends are conditionally expected to be parallel”
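A minimal sketch of what "conditionally parallel" can mean, using simulated data rather than the Card and Krueger survey. Everything here is hypothetical: the store counts, the `chain_b` characteristic, and the effect sizes are invented, and this is not the authors' specification. The simulated characteristic is more common in the treated state and changes the trend between waves, so the plain DiD is biased while the model that lets the trend vary with the wave 1 characteristic is not.

```r
set.seed(1994)
n <- 2000                                   # hypothetical number of stores
treated <- rep(c(0, 1), each = n / 2)       # 1 = store in the treated state
# Made-up wave 1 characteristic, more common among treated stores
chain_b <- rbinom(n, 1, ifelse(treated == 1, 0.7, 0.3))
stores  <- data.frame(store = seq_len(n), treated = treated, chain_b = chain_b)

# Two waves per store: post = 0 (wave 1) and post = 1 (wave 2)
panel <- rbind(transform(stores, post = 0), transform(stores, post = 1))

true_effect <- 2
panel$employment <- with(panel,
  20 + 3 * chain_b - 1 * treated - 1 * post +
    1.5 * post * chain_b +                  # trend differs by chain
    true_effect * post * treated +          # treatment effect
    rnorm(nrow(panel), sd = 1))

# Unconditional DiD: assumes parallel trends without conditioning
naive <- lm(employment ~ post * treated, data = panel)

# Conditional DiD: trends are allowed to differ by the wave 1 characteristic
conditional <- lm(employment ~ post * treated + post * chain_b, data = panel)

naive_did       <- unname(coef(naive)["post:treated"])
conditional_did <- unname(coef(conditional)["post:treated"])
print(c(naive = naive_did, conditional = conditional_did, truth = true_effect))
```

In this simulation the control has to enter through its interaction with `post`: a covariate that only shifts levels cancels out of the DiD anyway, so it is the covariate's effect on the trend that needs to be modeled.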

Card, D., & Krueger, A. B. (1993). Minimum Wages and Employment: A Case Study of the Fast Food Industry in New Jersey and Pennsylvania.

Card and Krueger 1994

Card and Krueger replication script

Multiple Periods and Cases

What about multiple cases or time periods? Or cases where observations are treated at different times?

For instance, what if I want to look at multiple states that passed minimum wage laws on different dates?

Two-Way Fixed Effects

The difference-in-difference model is often generalized* to multiple groups/multiple periods by using a fixed effect for each group/time in place of the indicator for control vs. treatment cases:

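\[ \hat{Y}_{gt} = \alpha_g + \gamma_t + \delta X_{gt} \]

\[ \alpha_g = \text{Group Fixed Effect} \]

\[ \gamma_t = \text{Time Fixed Effect} \]

\[ X_{gt} = \text{Post Treatment Indicator (1 if group } g \text{ is treated at time } t\text{, 0 otherwise)} \]

\[ \delta = \text{Treatment Effect} \]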

* Prepare for caveats on this!
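A minimal sketch of this specification using base R's `lm()` on a simulated panel. The group counts, adoption dates, and effect size are all invented for illustration. The simulated effect is deliberately the same for every group and period, which is the friendly case for this model; it says nothing about how the estimate behaves when effects vary with treatment timing.

```r
set.seed(42)
n_groups  <- 30
n_periods <- 10
panel <- expand.grid(group = seq_len(n_groups), time = seq_len(n_periods))

# Hypothetical staggered adoption: groups 1-10 are treated from period 4,
# groups 11-20 from period 7, and groups 21-30 are never treated
adopt <- c(rep(4, 10), rep(7, 10), rep(Inf, 10))
panel$treated_post <- as.integer(panel$time >= adopt[panel$group])  # X_gt

group_effect <- rnorm(n_groups)    # alpha_g
time_effect  <- rnorm(n_periods)   # gamma_t
true_effect  <- 2                  # delta, constant by construction

panel$y <- group_effect[panel$group] + time_effect[panel$time] +
  true_effect * panel$treated_post + rnorm(nrow(panel), sd = 0.5)

# Two-way fixed effects: one dummy per group and per period,
# plus the post-treatment indicator
twfe <- lm(y ~ factor(group) + factor(time) + treated_post, data = panel)
delta_hat <- unname(coef(twfe)["treated_post"])
print(delta_hat)
```

For larger panels a dedicated fixed-effects package is usually more practical than building the dummies with `factor()`, but the model being fit is the same.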

Two-Way Fixed Effects Alternatives

  • This isn’t really equivalent to the difference-in-differences model if there are multiple time periods and/or differences in treatment timing, and in most such situations it won’t estimate a causal effect even if the parallel trends assumption holds. See Imai, K., & Kim, I. S. (2021).

  • But we have some alternative methods!

Considerations

  • Is the parallel trends assumption plausible? The best research will attempt to justify this assumption using multiple lines of evidence like:

    • Placebo tests

    • Examining multiple pre-treatment periods to see if trends were already parallel well before the treatment.

  • Two-way fixed effects models might not be equivalent to the difference-in-differences method, so more recent analyses should account for this.
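A placebo test of the kind listed above can be sketched as follows, again on simulated data with invented values. The idea is to keep only pre-treatment periods, pretend the treatment started earlier than it did, and estimate the same DiD model. An estimate near zero is consistent with parallel pre-trends, though it does not prove the trends would have stayed parallel afterwards.

```r
set.seed(7)
n_units <- 400
panel <- expand.grid(unit = seq_len(n_units), time = 1:6)
panel$treated <- as.integer(panel$unit <= n_units / 2)

# Hypothetical setup: the real treatment starts in period 5
panel$post <- as.integer(panel$time >= 5)
panel$y <- 10 + 2 * panel$treated + 0.5 * panel$time +   # shared trend
  3 * panel$treated * panel$post +                       # true effect of 3
  rnorm(nrow(panel))

# Placebo: use pre-treatment periods only and pretend treatment began in period 3
pre <- subset(panel, time <= 4)
pre$fake_post <- as.integer(pre$time >= 3)
placebo <- lm(y ~ fake_post * treated, data = pre)
placebo_did <- unname(coef(placebo)["fake_post:treated"])

# For comparison, the DiD estimate at the real treatment date
actual <- lm(y ~ post * treated, data = panel)
actual_did <- unname(coef(actual)["post:treated"])

print(c(placebo = placebo_did, actual = actual_did))
```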